One-against-All Weighted Dynamic Time Warping for Language-Independent and Speaker-Dependent Speech Recognition in Adverse Conditions

نویسندگان

  • Xianglilan Zhang
  • Jiping Sun
  • Zhigang Luo
چکیده

Considering personal privacy and difficulty of obtaining training material for many seldom used English words and (often non-English) names, language-independent (LI) with lightweight speaker-dependent (SD) automatic speech recognition (ASR) is a promising option to solve the problem. The dynamic time warping (DTW) algorithm is the state-of-the-art algorithm for small foot-print SD ASR applications with limited storage space and small vocabulary, such as voice dialing on mobile devices, menu-driven recognition, and voice control on vehicles and robotics. Even though we have successfully developed two fast and accurate DTW variations for clean speech data, speech recognition for adverse conditions is still a big challenge. In order to improve recognition accuracy in noisy environment and bad recording conditions such as too high or low volume, we introduce a novel one-against-all weighted DTW (OAWDTW). This method defines a one-against-all index (OAI) for each time frame of training data and applies the OAIs to the core DTW process. Given two speech signals, OAWDTW tunes their final alignment score by using OAI in the DTW process. Our method achieves better accuracies than DTW and merge-weighted DTW (MWDTW), as 6.97% relative reduction of error rate (RRER) compared with DTW and 15.91% RRER compared with MWDTW are observed in our extensive experiments on one representative SD dataset of four speakers' recordings. To the best of our knowledge, OAWDTW approach is the first weighted DTW specially designed for speech data in adverse conditions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robot Arm Performing Writing through Speech Recognition Using Dynamic Time Warping Algorithm

This paper aims to develop a writing robot by recognizing the speech signal from the user. The robot arm constructed mainly for the disabled people who can’t perform writing on their own. Here, dynamic time warping (DTW) algorithm is used to recognize the speech signal from the user. The action performed by the robot arm in the environment is done by reducing the redundancy which frequently fac...

متن کامل

Efficient Speech Recognition System for Isolated Digits

In this paper, an efficient speech recognition system is proposed for speaker-independent isolated digits (0 to 9). Using the Weighted MFCC (WMFCC), low computational overhead is achieved since only 13 weighted MFCC coefficients are used. In order to capture the trends of the extracted features, the local and global features are computed using the Improved Features for Dynamic Time Warping (IFD...

متن کامل

تخمین سریع ضرایب پیچش در هنجارسازی طول مجرای صوتی با استفاده از امتیاز به دست آمده از مدلسازی تشخیص جنسیت

The performance of automatic speech recognition (ASR) systems is adversely affected by the variations in speakers, audio channels and environmental conditions. Making these systems robust to these variations is still a big challenge. One of the main sources of variations in the speakers is the differences between their Vocal Tract Length (VTL). Vocal Tract Length Normalization (VTLN) is an effe...

متن کامل

Development of Isolated Word Speech Recognition System

The isolated word speech recognition system based on dynamic time warping (DTW) has been developed. Speaker adaptation is performed using speaker recognition techniques. Vector quantization is used to create reference templates for speaker recognition. Linear predictive coding (LPC) parameters are used as features for recognition. Performance is evaluated using 12 words of Lithuanian language p...

متن کامل

Speaker Recognition Using Gaussian Mixtures Models

Speech signal contains several levels of information. At first it contains information about the spoken message. At second level speech signal also gives information about the speaker identity, his emotional state and so on. The task of speaker recognition can be divided into two parts: speaker identification and speaker verification. Speaker identification is answering the question which one o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 9  شماره 

صفحات  -

تاریخ انتشار 2014